Final Report for ZSY Playing
Abstract
Introduction

We aimed to train an agent through reinforcement learning (RL) to beat a human at the card game ZSY. The input is the game state at a given turn, and the output is the best move for that state. The abbreviated game rules are as follows:

1. A game consists of a number of rounds, and each round consists of a number of turns. Both players are dealt 18 random cards (their ‘hands’) at the start. In the first round, one player is randomly selected to take the first turn. That player plays a pattern, which can be one of the following: a ‘single’ (a single card, e.g. 5), a ‘double’ (two cards of the same rank, e.g. 99), a ‘triple’ (three cards of the same rank, e.g. 777), a ‘bomb’ (four cards of the same rank, e.g. 4444), or a ‘chain’ (consecutively ranked cards in which each ‘link’ is at least a double, e.g. 6677788, but not 66788, because there are fewer than two 7s, and not 6688, because the ranks are not consecutive).
2. At each following turn, a player can play cards that match the pattern and have a higher rank (e.g. 555 can be followed by 777 but not by 88 or 444; 55666 can be followed by 77888 but not by 77788 or 33444), play a bomb, or pass. Players alternate turns until one player passes, at which point the round terminates. The last player to play cards in the round begins the next round and sets a new pattern. The game terminates when one player runs out of cards; that player wins the game.

To model this game, we built a game class in Python that deals cards, keeps track of the running card counts, and sends game state data to the appropriate agent class, which returns a move according to that agent’s policy.

This was a joint project with CS 221, but in that class we approached a different version of the game in which both players could see each other’s cards. In that case, the game is theoretically deterministic from the perspective of the agents, and we tried to design a minimax agent rather than a learning agent. The only shared portion was the game class we wrote, modified to send full game state data (i.e. the hands of both the agent and its opponent) to the agents for CS 221 rather than partial game state data (i.e. the hand of the agent and the number of cards the opponent has left) for CS 229.

We tried two RL algorithms: Q-learning and TD-learning. In the first, we tried to learn values for states paired with actions (the moves taken); in the second, we tried to learn values for the states alone.
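The difference between the two algorithms can be illustrated with a minimal sketch of the textbook tabular updates. This is not the report's implementation: the report does not say how states were represented or which hyperparameters were used, so the tables, the `ALPHA` and `GAMMA` values, and the function names `q_update` and `td_update` are all assumptions made for illustration.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from the report)
GAMMA = 1.0  # discount factor (assumed value, not from the report)

# Hypothetical tables: Q-learning stores a value per (state, action) pair,
# while TD-learning stores a value per state only.
q_values = defaultdict(float)
state_values = defaultdict(float)


def q_update(state, action, reward, next_state, next_actions):
    """One tabular Q-learning step for the pair (state, action)."""
    # Value of the best available follow-up action; 0.0 if the game has ended.
    best_next = max(
        (q_values[(next_state, a)] for a in next_actions), default=0.0
    )
    target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (target - q_values[(state, action)])


def td_update(state, reward, next_state, terminal=False):
    """One tabular TD(0) step for the value of state."""
    next_value = 0.0 if terminal else state_values[next_state]
    target = reward + GAMMA * next_value
    state_values[state] += ALPHA * (target - state_values[state])
```

In a two-player game such as ZSY, `next_state` would presumably be the state the agent observes after the opponent has replied, but that choice is also an assumption here.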
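The pattern rules from the abbreviated rule list can be sketched as a small classifier. This is an illustration under stated assumptions, not the game class described in the report: the function name `classify`, the string encoding of cards, and the rank ordering `RANK_ORDER` are hypothetical, and the sketch reads "at least a double" as "two or more cards per link" without capping the link size.

```python
from collections import Counter

# Assumed rank ordering, low to high; the abbreviated rules do not state it.
RANK_ORDER = "3456789TJQKA2"


def classify(cards, rank_order=RANK_ORDER):
    """Return the pattern name for a string of rank characters, or None."""
    counts = Counter(cards)
    if not cards or any(rank not in rank_order for rank in counts):
        return None
    if len(counts) == 1:
        # All cards share one rank: single, double, triple, or bomb.
        names = {1: "single", 2: "double", 3: "triple", 4: "bomb"}
        return names.get(len(cards))
    # Several ranks: a chain needs consecutive ranks and at least two
    # cards in every link.
    positions = sorted(rank_order.index(rank) for rank in counts)
    consecutive = all(b - a == 1 for a, b in zip(positions, positions[1:]))
    links_ok = all(n >= 2 for n in counts.values())
    return "chain" if consecutive and links_ok else None
```

Applied to the examples in the rules, `classify("6677788")` returns `"chain"`, while `classify("66788")` and `classify("6688")` both return `None`.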